Data Visualization for Education

SDP Fellow Workshop January 2013

Jared Knowles
Policy Research Advisor, Wisconsin DPI

Objectives

  1. Review data visualization principles
  2. Look at applications in education data
  3. Challenges in an LEA/SEA
  4. Best practices and advice
  5. What tools to use

Example

library(ggplot2); library(eeptools)  # theme_dpi() comes from eeptools
qplot(hp, mpg, data = mtcars) + theme_dpi()

plot of chunk plot

Principles

  • Elements of a chart
  • Chart Types and Data Types
  • Dimensionality
  • Scale
  • Complexity
  • Technical details
  • Beyond charts

plot of chunk plot1

Chart Elements

There are a few things that all charts need. There are sometimes strong reasons to deviate from these, but they are good rules of thumb.

  • Axis labels and a chart title
    • These make the chart self-explanatory
  • A legend
    • What is the unit in the graphic?
  • A scale
    • How are units mapped to the visual space?
  • Annotations
    • Author and data source (depending on distribution)
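
As a minimal sketch of the checklist above, the example chart can be given a title, axis labels, a legend title, an explicit scale, and a source annotation. It uses `ggplot()` rather than `qplot()`, and the title and label wording are assumptions chosen for illustration.

```r
library(ggplot2)

# Scatterplot of the built-in mtcars data with each element named above.
p <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(
    title   = "Fuel economy falls as horsepower rises",  # chart title
    x       = "Gross horsepower",                        # axis labels
    y       = "Miles per gallon",
    color   = "Cylinders",                               # legend title
    caption = "Source: mtcars (1974 Motor Trend)"        # annotation: data source
  ) +
  scale_y_continuous(limits = c(0, 40))                  # explicit scale
```

Calling `print(p)` draws the chart.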

Data Types

Statistics we can use

Level of Meas.   Stats
Nominal          mode, Chi-squared
Ordinal          median, percentile
Interval         mean, std. deviation, correlation, ANOVA
Continuous       geometric mean, harmonic mean, logarithms
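
A rough sketch of one statistic per level of measurement, in base R. The toy values and variable names below are invented for illustration and do not come from any real dataset.

```r
# Toy data, invented for illustration only.
school_type <- factor(c("public", "charter", "public", "private", "public"))
proficiency <- factor(c("basic", "advanced", "proficient", "basic", "proficient"),
                      levels = c("basic", "proficient", "advanced"),
                      ordered = TRUE)
scale_score <- c(410, 455, 430, 470, 445)
enrollment  <- c(120, 480, 250, 960, 300)

# Nominal: the mode is the most frequent category.
mode_type <- names(which.max(table(school_type)))

# Ordinal: the median is defined because the levels have an order.
median_level <- levels(proficiency)[median(as.integer(proficiency))]

# Interval: mean and standard deviation.
mean_score <- mean(scale_score)
sd_score   <- sd(scale_score)

# Continuous: geometric mean, computed through logarithms.
geo_mean <- exp(mean(log(enrollment)))
```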

Aesthetics for Mapping

Aesthetic   Discrete                     Continuous
Color       Disparate colors             Sequential or divergent colors
Size        Unique size for each value   mapping to radius of value
Shape       A shape for each value       does not make sense
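
The color row of this table might look like the following in ggplot2. This is a sketch only: the palette and color names are arbitrary choices, not recommendations from the workshop.

```r
library(ggplot2)

# Discrete variable: disparate (qualitative) colors.
p_discrete <- ggplot(mtcars, aes(hp, mpg, color = factor(cyl))) +
  geom_point() +
  scale_color_brewer(palette = "Set1", name = "Cylinders")

# Continuous variable: a sequential gradient from light to dark.
p_sequential <- ggplot(mtcars, aes(hp, mpg, color = wt)) +
  geom_point() +
  scale_color_gradient(low = "grey80", high = "darkblue", name = "Weight")

# Continuous variable with a meaningful midpoint: a divergent gradient.
p_divergent <- ggplot(mtcars, aes(hp, mpg, color = wt - mean(wt))) +
  geom_point() +
  scale_color_gradient2(low = "firebrick", mid = "grey90", high = "steelblue",
                        midpoint = 0, name = "Weight vs. mean")
```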

Ordered vs. Unordered

Mapping Possibilities

Aesthetic   Ordered                           Unordered
Color       Sequential or divergent colors    Rainbow
Size        Increasing or decreasing radius   does not make sense
Shape       does not make sense               A shape for each value
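
One way to express the ordered and unordered cases in R is through the factor type: an ordered factor gets a sequential palette, and a plain factor gets shapes. The column names `cyl_ordered` and `gearbox` are hypothetical, introduced only for this sketch.

```r
library(ggplot2)

cars_df <- mtcars
# Cylinder count has a natural order, so store it as an ordered factor.
cars_df$cyl_ordered <- factor(cars_df$cyl, levels = c(4, 6, 8), ordered = TRUE)
# Transmission type has no order, so a plain factor is enough.
cars_df$gearbox <- factor(cars_df$am, levels = c(0, 1),
                          labels = c("automatic", "manual"))

# Ordered: a sequential palette runs light to dark with the levels.
p_ordered <- ggplot(cars_df, aes(hp, mpg, color = cyl_ordered)) +
  geom_point(size = 3) +
  scale_color_brewer(palette = "Blues", name = "Cylinders")

# Unordered: one shape per category, with no implied ranking.
p_unordered <- ggplot(cars_df, aes(hp, mpg, shape = gearbox)) +
  geom_point(size = 3) +
  scale_shape_discrete(name = "Transmission")
```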

Example

plot of chunk plot1.1 (six plots)

Arbitrary image

Think like a map. Data density and easy interpretability.

Some tips

  • Focus on the content
  • Use best practices
  • Understand the limitations
  • Experiment and iterate!

Charting Data

  • The type of data we look at determines the way it should be presented
  • It always starts with the data
  • Let's review the data types
    • Categorical
    • Ordinal
    • Interval
    • Continuous

Charting Categorical Data

plot of chunk unnamed-chunk-1

Conditional 2

plot of chunk unnamed-chunk-2

Complexity

How do we display a ton of data--tens or hundreds of thousands of observations?

  1. Summarize the data
    • Display summary statistics visually depicting the central tendency and spread of data
  2. Plot the raw data
    • Annotate wisely to display the main message
  3. Model the data
    • Use a statistical model to summarize features of the data

Let's look at some examples of this.

Summarizing Data

  • The simplest summaries are measures of central tendency, which are also the most easily understood
  • It is just as important to look at the spread of the data
  • If time is of interest, we want to see trends
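
A minimal sketch of summarizing central tendency and spread together, assuming simulated scale scores by grade. All values are generated at random for illustration and are not the data behind the plots in this deck.

```r
library(ggplot2)

set.seed(2013)
# Simulated scale scores; values are invented for illustration.
scores <- data.frame(
  grade = rep(3:8, each = 200),
  score = rnorm(1200, mean = rep(seq(400, 500, by = 20), each = 200), sd = 40)
)

# Central tendency and spread for each grade.
summ <- aggregate(score ~ grade, data = scores,
                  FUN = function(x) c(mean = mean(x), sd = sd(x)))
summ <- data.frame(grade = summ$grade, summ$score)  # flatten the matrix column

# Means joined by a line, with +/- 1 standard deviation shown as ranges.
p <- ggplot(summ, aes(x = grade, y = mean)) +
  geom_line() +
  geom_pointrange(aes(ymin = mean - sd, ymax = mean + sd)) +
  labs(x = "Grade", y = "Mean scale score (+/- 1 SD)")
```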

Plotting Means

plot of chunk plotmeans

But, what's wrong with this plot?

Mistakes

  • No sense of scale
  • Means can be skewed
  • Simple means are not meaningful
  • With assessment scores we need to know grade distribution
  • Let's try to improve this

plot of chunk plotmeanssmall

Try 2

plot of chunk meanplot2

With the same physical space, what additional information are we providing?

plot of chunk meanplot3

How can we do even better?

Annotation

We still aren't sure what the mean scale score means. Let's see a couple more additions that can really make this useful.

plot of chunk meanplot4
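
One kind of addition that gives a mean score some meaning is a labeled reference line. The sketch below assumes invented grade-level means and a hypothetical proficiency cut score of 450. Neither comes from the workshop data.

```r
library(ggplot2)

# Invented grade-level means and a hypothetical proficiency cut score.
means <- data.frame(grade = 3:8,
                    mean_score = c(405, 422, 441, 458, 476, 495))
cut_score <- 450

p <- ggplot(means, aes(grade, mean_score)) +
  geom_line() +
  geom_point() +
  # Reference line that shows where the cut score sits on the scale.
  geom_hline(yintercept = cut_score, linetype = "dashed") +
  # Text label placed just above the left end of the reference line.
  annotate("text", x = 3, y = cut_score,
           label = "Proficient cut score (hypothetical)",
           hjust = 0, vjust = -0.5) +
  labs(x = "Grade", y = "Mean scale score")
```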

Raw Data

Sometimes, we can get away with showing the raw data, that is, all data points. We may want to do this for a few reasons:

  • the "wow" effect,
  • because it is easier,
  • or because it looks better aesthetically.

How could it be done?

600,000 Observations: Too Many

plot of chunk rawdata1

Spread the Data Out

  • Without reducing the number of data points, we need to do three things to be successful:
  1. Spread the data out
    • These points overlap each other and make a mess
  2. Reduce the ink
    • Each point has too much "weight"
  3. Add reference points
    • 600,000 observations in one panel is not meaningful
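
The three steps above can be sketched with facets, transparency, and a reference line. The data are simulated, and 20,000 rows stand in for the 600,000 observations to keep the example quick. The `prior` and `score` columns are hypothetical.

```r
library(ggplot2)

set.seed(42)
n <- 20000  # a stand-in for the 600,000 observations on the slide
raw <- data.frame(
  grade = sample(3:8, n, replace = TRUE),
  prior = rnorm(n, mean = 450, sd = 50)
)
raw$score <- raw$prior + rnorm(n, mean = 10, sd = 30)

p <- ggplot(raw, aes(prior, score)) +
  geom_point(alpha = 0.05, size = 0.5) +                   # 2. reduce the ink
  geom_abline(intercept = 0, slope = 1, color = "red") +   # 3. reference: no change from prior score
  facet_wrap(~ grade) +                                    # 1. spread the data across panels
  labs(x = "Prior score", y = "Current score")
```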

What About This

plot of chunk rawdata2

Even Smaller Multiples

plot of chunk rawdata3

Modeling the Data

All models are wrong. Some models are useful.

Regression Trees

Trees divide up the variation in a dataset and rank the explanatory variables.

plot of chunk unnamed-chunk-3
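
A small sketch of a regression tree on the mtcars data. It uses the rpart package as a stand-in because it ships with R. The session information for this deck lists party and partykit, so the original plot was probably drawn with those instead.

```r
library(rpart)

# Regression tree predicting fuel economy from the other mtcars columns.
fit <- rpart(mpg ~ ., data = mtcars, method = "anova")

# Variables ranked by how much variation their splits account for.
importance <- sort(fit$variable.importance, decreasing = TRUE)
print(round(importance, 1))

# Draw the tree with base graphics and label each node with its count.
plot(fit, margin = 0.1)
text(fit, use.n = TRUE)
```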

Smoothers

plot of chunk unnamed-chunk-4

Model Results

Model Results II

Model Results III

We can combine these features.

  • Facets with smoother lines for references
  • Summary plots with raw data in the background
  • Reference lines and group comparisons
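
A sketch of combining these features in one plot: raw points in the background, a smoother in each facet, and a shared reference line. A linear fit is used as the smoother here because the facets hold only a few points each. That choice is an assumption for the example.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(hp, mpg)) +
  geom_point(color = "grey60") +                             # raw data in the background
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # smoother line in each facet
  geom_hline(yintercept = mean(mtcars$mpg),
             linetype = "dashed") +                          # overall mean as a reference
  facet_wrap(~ cyl) +                                        # one panel per group
  labs(x = "Gross horsepower", y = "Miles per gallon")
```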

Some tips

  • Have a properly chosen format and design
  • Use words, numbers and drawing together
  • Reflect a balance, a proportion, a sense of relevant scale
  • Display an accessible complexity of detail
  • Have a narrative quality, tell a story
  • Avoid content-free decoration (Tufte's proverbial chartjunk)
  • Draw in a professional manner with an eye on the technical details
  • Remember the map

Themes convey brands

library(ggthemes)  # provides theme_economist(), theme_tufte(), and theme_stata()
qplot(hp, mpg, data = mtcars) + theme_economist()

plot of chunk plot2

They Also Communicate

qplot(hp, mpg, data = mtcars) + theme_tufte()

plot of chunk plot3

They Also Can Confound

qplot(hp, mpg, data = mtcars, color = factor(cyl)) + theme_excel2003() + scale_color_excel2003()

plot of chunk plot4

So Choose Wisely

qplot(hp, mpg, data = mtcars, color = factor(cyl)) + theme_stata()

plot of chunk plot5

Stacked Bar

Box and Whisker

Bullet Chart

Calendar

Lines

Parallel Coordinates

Parallel Sets

Streamgraph

Tree Map

Word Cloud

Graphics Types

Raster

  • Files like jpg, png, and gif
  • Fixed scale, aspect ratio, and size
  • Reasonable file size
  • Viewable in almost any image viewing and editing system, including any modern web browser, PowerPoint, etc.

Vector

  • Files like pdf and svg
  • Infinitely zoomable, adjustable on the fly
  • Larger file size when many points are drawn
  • Viewable and usable in fewer systems. SVGs can be used in modern web browsers. PDFs can be included in other PDF reports.
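
A minimal sketch of writing the same plot as a raster file and as a vector file with `ggsave()`. The file names and dimensions are arbitrary, and the files go to a temporary directory.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(hp, mpg)) + geom_point()

out_dir  <- tempdir()
png_file <- file.path(out_dir, "scatter.png")
pdf_file <- file.path(out_dir, "scatter.pdf")

# Raster: pixel dimensions are fixed by width, height, and dpi.
ggsave(png_file, plot = p, width = 6, height = 4, dpi = 150)

# Vector: the same plot stays sharp at any zoom level.
ggsave(pdf_file, plot = p, width = 6, height = 4)
```

The device is chosen from the file extension, so the only difference between the two calls is the file name and the `dpi` setting.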

Beyond Graphics

We have a number of other techniques we can use beyond simple charts.

  • Animations
  • Interactive demos
  • Summary tables
  • Videos
  • Web sites

Maps

Animations

Ugly graphic

Backmatter

print(sessionInfo(), locale = FALSE)
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

attached base packages:
 [1] stats4    grid      splines   stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] ggthemes_1.1.0    knitr_0.9         markdown_0.5.3   
 [4] whisker_0.1       slidify_0.3.3     devtools_0.8     
 [7] eeptools_0.1      mapproj_1.1-8.3   maps_2.2-8       
[10] proto_0.3-9.2     plyr_1.8          stringr_0.6.2    
[13] mgcv_1.7-22       Formula_1.1-0     partykit_0.1-4   
[16] party_1.0-3       vcd_1.2-13        colorspace_1.2-0 
[19] MASS_7.3-22       strucchange_1.4-7 sandwich_2.2-9   
[22] zoo_1.7-9         coin_1.0-21       mvtnorm_0.9-9993 
[25] modeltools_0.2-19 survival_2.37-2   ggplot2_0.9.3    

loaded via a namespace (and not attached):
 [1] dichromat_1.2-4    digest_0.6.0       evaluate_0.4.3    
 [4] formatR_0.7        gtable_0.1.2       httr_0.2          
 [7] labeling_0.1       lattice_0.20-10    Matrix_1.0-10     
[10] memoise_0.1        munsell_0.4        nlme_3.1-106      
[13] parallel_2.15.2    RColorBrewer_1.0-5 RCurl_1.95-3      
[16] reshape2_1.2.2     scales_0.2.3       tools_2.15.2      
[19] yaml_2.1.5        

References